Final cleaning and linking of AKB IDs to Yambol Features
The script explores the consistency of datatypes and feature attributes created in previous scripts, streamlines loose ends, and marries the records to their AKB counterparts. Final data are clipped to Yambol boundary and spatial duplicates are identified and filtering facilitated by adding logical campaign-indicating columns. This script can be run standalone and is final. There is no need to run previous scripts unless one wishes to explore the data in their raw state or change some of the processing.
What is AKB
AKB stands for the “Arheologicheska Karta na Bulgaria”, national digital register, where all archaeological sites and places of interest are entered upon their first investigation. The purpose of the present script is to add AKB identifiers assembled by Todor Valchev to the Yambol field observations. The biggest benefit of this exercise is that the AKB reference lets us see if a given mound was excavated and dated. It also allows users with access to AKB to look up the full cultural heritage information associated with the record.
Sites are often entered into AKB upon their first visit, with additional data added upon revisit, excavations, or other analyses.
The AKB link is one reason why you may want to filter the Yambol data
for one or another spatial versions: Upto2010 and
Post2010. While the Post2010 is based on later
revisits and thus more recent and authoritative (especially if driven by
the need to update or improve the record), the Upto2010
record IDs were use to create links in AKB and so are more suitable for
mapping of cultural heritage data in AKB with the environmental
variables present here.
Load data - master_sp_enriched
Here we load aggregated and enriched features from Yambol and wider surroundings.
# not deduplicated for final dataset
features <- st_read("../output_data/interim/master_sp_enriched.geojson")Reading layer `master_sp_enriched' from data source
`C:\Users\Adela\Documents\RStudio\MoundMerging2023\output_data\interim\master_sp_enriched.geojson'
using driver `GeoJSON'
Simple feature collection with 1484 features and 23 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
Validation I: Check how many of the features are actually mounds
Simple feature collection with 9 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 9 × 3
Type n geometry
<chr> <int> <GEOMETRY [m]>
1 Burial Mound 1024 MULTIPOINT ((420515 4693580), (420707.1 4690851),…
2 Burial Mound? 91 MULTIPOINT ((422391.7 4698832), (429156.7 4695478…
3 Extinct Burial Mound 188 MULTIPOINT ((420832.5 4689836), (421875.6 4693594…
4 Extinct Burial Mound? 10 MULTIPOINT ((468291.5 4660440), (469551.2 4669012…
5 Other 71 MULTIPOINT ((426968.4 4693384), (462171.2 4655463…
6 Surface Scatter 64 MULTIPOINT ((421045.4 4690265), (424207.4 4694969…
7 Tell 3 MULTIPOINT ((465676.3 4675875), (467848.2 4662406…
8 Tell? 1 POINT (483400.7 4697154)
9 Uncertain Feature 32 MULTIPOINT ((437475.3 4690926), (445710.4 4699311…
# Verify the Source in the category 'Other' of Type.
features %>%
filter(Type == "Other") %>%
group_by(Source) %>%
tally()Simple feature collection with 2 features and 2 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 426968.4 ymin: 4647347 xmax: 496032.5 ymax: 4700030
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 2 × 3
Source n geometry
<chr> <int> <MULTIPOINT [m]>
1 Legacy verification 30 ((426968.4 4693384), (462171.2 4655463), (462343 46…
2 Survey 41 ((462452.2 4662370), (463837.3 4664726), (463869.9 …
In the tally of “Other” types, there are 34 Legacy verification features and 41 Survey features in the later monitoring and 30 LgV and 41 Survey in the earlier monitoring. While the latter are expected, the 34 verified features required follow up on 27 Dec 2022. Inspection showed that many of the verified features originate not from sunbursts but other map markers, such as rayed squares and triangles. We inspected these in the early seasons but stopped doing so after these turned out not to lead to mounds reliably.
What is hiding under “other”?
features %>%
filter(Type == "Other" & Source == "Legacy verification") %>%
group_by(PrincipalSourceOfImpact) %>%
tally()Simple feature collection with 13 features and 2 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 426968.4 ymin: 4647347 xmax: 496032.5 ymax: 4693384
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 13 × 3
PrincipalSourceOfImpact n geometry
<chr> <int> <GEOMETRY [m]>
1 Agriculture 3 MULTIPOINT ((482322.1 46…
2 Construction 5 MULTIPOINT ((467436.7 46…
3 Erosion 2 MULTIPOINT ((462171.2 46…
4 No observation 8 MULTIPOINT ((462907.7 46…
5 Other 4 MULTIPOINT ((474493 4669…
6 Other (Geodetic point ) 1 POINT (470009.1 4671929)
7 Other (Military activity ) 1 POINT (471560.1 4669909)
8 Other (Military construction and geodetic ma… 1 POINT (472794.2 4671237)
9 Other (geodetic marker ) 1 POINT (469472.6 4669824)
10 Other (geodetic marker) 1 POINT (463869.9 4659734)
11 Other (military activity and geodetic marker… 1 POINT (478912 4682704)
12 Post-depositional 1 POINT (426968.4 4693384)
13 Urban development 1 POINT (491526.2 4676393)
We can see that a lot of the map symbols led to geodetic points and benchmarks, 8 were not accessible, and other locations contained water wells and other structures, whose setting prevented us from labelling them as extinct mounds
Validation II: Check for attribute duplicates
Spatial duplication is addressed in 06_Enrich.Rmd
numeric(0)
Validation III: Clean up condition, height and other attribute
Condition was documented on Likert scale from 1 to 5 with 1 being
pristine and 5 being extinct. 0 denoted No observation.
[1] "3" "2" "4" "5" "1" "6" "0"
Rows: 1,484
Columns: 24
$ TopoID <dbl> 200013, 200016, 200018, 200019, 200020, 200022…
$ Note <chr> "Diadopetkova mogila", NA, NA, NA, NA, NA, NA,…
$ TRAP <dbl> 9305, 9411, 9416, 9417, 9409, 9314, 9315, 9415…
$ Source <chr> "Legacy verification", "Survey", "Legacy verif…
$ Type <chr> "Burial Mound", "Burial Mound", "Burial Mound"…
$ LU_Around <chr> "Scrub", "Scrub", "Annual agriculture", "Pastu…
$ LU_Top <chr> "Scrub", "Scrub", "Annual agriculture", "Pastu…
$ Date <date> 2010-11-06, 2010-11-07, 2010-11-07, 2010-11-0…
$ DiameterMax <chr> "25", "15", "20", "15", "30", "30", "20", "25"…
$ HeightMax <dbl> 1.8, 1.5, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 6.0, 1…
$ Condition <chr> "3", "2", "3", "4", "2", "5", "2", "5", "2", "…
$ PrincipalSourceOfImpact <chr> NA, "Nodata", "Agriculture", "Looting", "Agric…
$ AllNotes <chr> "stones of the top", "", "part of the north s…
$ distBG <dbl> 68354.39, 61029.50, 62182.78, 62264.91, 61452.…
$ distTown <dbl> 9730.7864, 1806.6579, 1303.8512, 1276.7345, 14…
$ distTownBoundary <dbl> 635.3572, 1806.6579, 1303.8512, 1276.7345, 148…
$ ElevAster_m <dbl> NA, 242.0523, 187.9270, 185.9483, 215.0000, 22…
$ SlopeAster_degrees <dbl> NA, 3.527693e+00, 3.088841e+00, 3.120401e+00, …
$ AspectAster <dbl> NA, 133.30900, 57.23311, 53.89560, 53.73102, 1…
$ TRI <dbl> NA, 1.29536438, 1.18449783, 1.00713158, 0.1759…
$ TPI <dbl> NA, 5.708199e-01, -2.686462e-01, -9.536743e-06…
$ Roughness <dbl> NA, 4.6093445, 3.8489990, 4.0048981, 0.8009949…
$ prom250mbuff <dbl> NA, 7.17, 43.48, 12.04, 11.15, 67.27, 77.14, 1…
$ geometry <POINT [m]> POINT (441126.7 4710087), POINT (453727.…
features <- features %>%
dplyr::mutate(Condition = case_when(Condition == 0 ~ "NA", Condition == 6 ~ "5",
Condition != 0 ~ Condition)) %>%
dplyr::mutate(Condition = as.factor(Condition)) %>%
dplyr::mutate(TypeCertainty = case_when(grepl("\\?", Type) ~ "Uncertain", !grepl("\\?",
Type) ~ "Certain")) %>%
dplyr::mutate(Type = gsub("\\?", "", Type)) %>%
dplyr::mutate(DiameterMax = as.numeric(DiameterMax))
levels(features$Condition) = c(1, 2, 3, 4, 5, NA)
features %>%
group_by(TypeCertainty) %>%
tally()Simple feature collection with 2 features and 2 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 2 × 3
TypeCertainty n geometry
<chr> <int> <MULTIPOINT [m]>
1 Certain 1382 ((420515 4693580), (420707.1 4690851), (420832.5 4689836)…
2 Uncertain 102 ((422391.7 4698832), (429156.7 4695478), (429294.9 469564…
Constrain to Yambol Province
Reading layer `YamRegion' from data source
`C:\Users\Adela\Documents\RStudio\MoundMerging2023\input_data\Vectors\YamRegion.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 431400.8 ymin: 4641684 xmax: 504849.3 ymax: 4727299
Projected CRS: WGS 84 / UTM zone 35N
Simple feature collection with 1260 features and 24 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
TopoID Note TRAP Source Type LU_Around
2 200016 <NA> 9411 Survey Burial Mound Scrub
3 200018 <NA> 9416 Legacy verification Burial Mound Annual agriculture
4 200019 <NA> 9417 Legacy verification Burial Mound Pasture
LU_Top Date DiameterMax HeightMax Condition
2 Scrub 2010-11-07 15 1.5 2
3 Annual agriculture 2010-11-07 20 2.0 3
4 Pasture 2010-11-07 15 2.0 4
PrincipalSourceOfImpact
2 Nodata
3 Agriculture
4 Looting
AllNotes distBG
2 61029.50
3 part of the north slope was ploughted 62182.78
4 S part was mainly ploughted, 50 % of the mound was destroyed 62264.91
distTown distTownBoundary ElevAster_m SlopeAster_degrees AspectAster TRI
2 1806.658 1806.658 242.0523 3.527693 133.30900 1.295364
3 1303.851 1303.851 187.9270 3.088841 57.23311 1.184498
4 1276.735 1276.735 185.9483 3.120401 53.89560 1.007132
TPI Roughness prom250mbuff TypeCertainty geometry
2 5.708199e-01 4.609344 7.17 Certain POINT (453727.1 4706737)
3 -2.686462e-01 3.848999 43.48 Certain POINT (452999.1 4707739)
4 -9.536743e-06 4.004898 12.04 Certain POINT (452997 4707823)
[ reached 'max' / getOption("max.print") -- omitted 7 rows ]
Within Yambol, we documented 1243 features (early observations).
Execute attribute changes following manual review
This chunk corrects poor data entries in the structured digital forms on the basis of photographs, diaries, and AKB records.
# Type Certainty UNCERTAIN TO CERTAIN
certain <- Y_features %>%
filter(TRAP %in% c(9838,9852,8766,8359)) %>%
mutate(TypeCertainty = "Certain" )
# Type Burial mound to EXTINCT
extinct <- Y_features %>%
filter(TRAP %in% c(9516, 8337, 8388, 8770, 9910, 9911)) %>%
mutate(Type = "Extinct Burial Mound" )
# Type Burial mound to Scatter
scatter <- Y_features %>%
filter(TRAP == 9883) %>%
mutate(Type = "Surface Scatter" )
# Type Burial mound to Other
other <- Y_features %>%
filter(TRAP %in% c(8427, 9645))%>%
mutate(Type = "Other")
# Swap Source in these two and TypeCertainty
m8763 <- Y_features %>%
filter(TRAP == 8763) %>%
mutate(Source = "Legacy verification")
m8764 <- Y_features %>%
filter(TRAP == 8764) %>%
mutate(Source = "Survey", Type = "Extinct Burial Mound", TypeCertainty = "Uncertain")
fixes <- rbind(m8763,m8764, other,scatter,extinct,certain)
class(fixes$distBG)[1] "numeric"
[1] "TopoID" "Note"
[3] "TRAP" "Source"
[5] "Type" "LU_Around"
[7] "LU_Top" "Date"
[9] "DiameterMax" "HeightMax"
[11] "Condition" "PrincipalSourceOfImpact"
[13] "AllNotes" "distBG"
[15] "distTown" "distTownBoundary"
[17] "ElevAster_m" "SlopeAster_degrees"
[19] "AspectAster" "TRI"
[21] "TPI" "Roughness"
[23] "prom250mbuff" "TypeCertainty"
[25] "geometry"
[1] "TopoID" "Note"
[3] "TRAP" "Source"
[5] "Type" "LU_Around"
[7] "LU_Top" "Date"
[9] "DiameterMax" "HeightMax"
[11] "Condition" "PrincipalSourceOfImpact"
[13] "AllNotes" "distBG"
[15] "distTown" "distTownBoundary"
[17] "ElevAster_m" "SlopeAster_degrees"
[19] "AspectAster" "TRI"
[21] "TPI" "Roughness"
[23] "prom250mbuff" "TypeCertainty"
[25] "geometry"
# row_update() fails on the units distBG and geometry columns so we eliminate these for a moment to apply fixes, and then rejoin them.
features_fixed <- rows_update(
#remove distBG
st_drop_geometry(Y_features)[,-14],
st_drop_geometry(fixes)[,-14],
by = "TRAP")
# rejoin geometry and distBG to cleaned source and type data
features_fixed <- features_fixed %>%
left_join(Y_features[,c("TRAP", "distBG")], by = "TRAP") %>%
st_as_sf()
features_fixed %>% mapview(zcol = "Type")Simple feature collection with 4 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 465676.3 ymin: 4662406 xmax: 483400.7 ymax: 4697154
Projected CRS: WGS 84 / UTM zone 35N
Date TRAP Source geometry
1 2018-09-20 9802 Survey POINT (483395.3 4665848)
2 2022-09-13 9825 Legacy verification POINT (467848.2 4662406)
3 2022-09-22 8777 Legacy verification POINT (483400.7 4697154)
4 2022-09-28 9914 Legacy verification POINT (465676.3 4675875)
Fix missing data notation
AllNotes field and TopoID field sometimes contain zeroes or are outright blank to indicate missing data. Missing data is legitimate, as only Map symbols get TopoID and many mounds were first encountered during Survey. Likewise, AllNotes field is aggregated from annotations in the mobile app, which served to comment on structured data (dropdown options), indicating some deviation or addition. They do not exist everywhere. Let’s convert all these zeroes and blanks to NAs for consistency.
features_fixed$AllNotes <- ifelse(features_fixed$AllNotes == "0" | features_fixed$AllNotes ==
"", NA, features_fixed$AllNotes)
features_fixed$TopoID <- ifelse(features_fixed$TopoID == "0", NA, features_fixed$TopoID)Clean up after the changes
AKB identifiers
Here we add AKB numbers to feature data so their future analyses can be brought to bear on the environmental attributes documented during survey.
1055 AKB records
We have 1055 AKB numbers among 1466 visited features. The mismatch is because some features were not deemed worthy of registering (extinct or overbuilt status, military bunkers. etc)
[1] 1055
[1] "TRAP" "AKB" "Area" "Notes"
# A tibble: 6 × 4
TRAP AKB Area Notes
<dbl> <dbl> <chr> <chr>
1 8048 2700001 Borisovo <NA>
2 9200 2700063 Yambol <NA>
3 8356 2700064 Yambol <NA>
4 8351 2700065 Yambol second TRAP number 9227
5 9227 2700065 Yambol second TRAP number 8351
6 8353 2700066 Yambol second TRAP number 9257
# A tibble: 29 × 4
TRAP AKB Area Notes
<dbl> <dbl> <chr> <chr>
1 8047 2700149 Borisovo Excavated by Neli Tancheva
2 8043 2700151 Borisovo Excavated by Daniela Agre
3 8348 2700220 Mogila Excavated by Piotr - Rome
4 8357 2700221 Mogila Excavated by Volker EBA
5 8345 2700222 Mogila Excavated by Ilia EBA
6 8349 2700226 Mogila Excavated by Piotr - Rome
7 6009 2700287 Boyanovo Excavated by Ilia Iliev EBA
8 9357 2700289 Boyanovo Excavated by Daniela Agre EBA
9 7004 2790017 Boyanovo Excavated by Ilia Iliev
10 7005 2790018 Boyanovo Excavated by Daniela Agre
# ℹ 19 more rows
# A tibble: 2 × 4
TRAP AKB Area Notes
<dbl> <dbl> <chr> <chr>
1 9962 10009821 Pravdino <NA>
2 9962 10009824 Pravdino Feature 2 on photos
[1] 965 968
Join AKB to features
Y_features <- features_fixed %>%
left_join(AKB, by = c("TRAP")) %>%
rename(AKBNotes = Notes)
mapview(Y_features, zcol = "AKBNotes") [1] "TopoID" "Note"
[3] "TRAP" "Source"
[5] "Type" "LU_Around"
[7] "LU_Top" "Date"
[9] "DiameterMax" "HeightMax"
[11] "Condition" "PrincipalSourceOfImpact"
[13] "AllNotes" "distTown"
[15] "distTownBoundary" "ElevAster_m"
[17] "SlopeAster_degrees" "AspectAster"
[19] "TRI" "TPI"
[21] "Roughness" "prom250mbuff"
[23] "TypeCertainty" "distBG"
[25] "AKB" "Area"
[27] "AKBNotes" "geometry"
Simple feature collection with 6 features and 18 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 452997 ymin: 4706447 xmax: 454596 ymax: 4707823
Projected CRS: WGS 84 / UTM zone 35N
HeightMax Condition PrincipalSourceOfImpact
1 1.5 2 Nodata
2 2.0 3 Agriculture
3 2.0 4 Looting
AllNotes distTown
1 <NA> 1806.658
2 part of the north slope was ploughted 1303.851
3 S part was mainly ploughted, 50 % of the mound was destroyed 1276.735
distTownBoundary ElevAster_m SlopeAster_degrees AspectAster TRI
1 1806.658 242.0523 3.527693 133.30900 1.295364
2 1303.851 187.9270 3.088841 57.23311 1.184498
3 1276.735 185.9483 3.120401 53.89560 1.007132
TPI Roughness prom250mbuff TypeCertainty distBG AKB
1 5.708199e-01 4.609344 7.17 Certain 61029.50 10001492
2 -2.686462e-01 3.848999 43.48 Certain 62182.78 10001486
3 -9.536743e-06 4.004898 12.04 Certain 62264.91 10001487
Area AKBNotes geometry
1 Drazhevo <NA> POINT (453727.1 4706737)
2 Hadzhidimitrovo <NA> POINT (452999.1 4707739)
3 Hadzhidimitrovo <NA> POINT (452997 4707823)
[ reached 'max' / getOption("max.print") -- omitted 3 rows ]
# Are there duplicate AKB numbers?
AKBduplicated <- Y_features %>%
filter(!is.na(AKB)) %>%
filter(duplicated(AKB)) %>%
pull(AKB)
Y_features %>%
dplyr::filter(AKB %in% AKBduplicated) %>%
dplyr::select(TRAP, AKB)Simple feature collection with 34 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 462153.1 ymin: 4661331 xmax: 492400 ymax: 4706973
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
TRAP AKB geometry
1 9222 2700226 POINT (464359.3 4706914)
2 9225 10007352 POINT (464401.3 4706968)
3 9226 10007350 POINT (464230 4706932)
4 9220 2700221 POINT (465196.3 4706385)
5 9258 2700067 POINT (462161.1 4705892)
6 9257 2700066 POINT (462153.1 4705829)
7 9216 2700216 POINT (467217.8 4706895)
8 9365 10001274 POINT (474347.1 4671482)
9 9223 2700220 POINT (464354.2 4706928)
10 9224 10007351 POINT (464404.7 4706951)
Beware: In 8 instances [10007351 10007352 2700220 2700226 10007031 10009627 10009305 10001274 10009811], AKB numbers are duplicated, meaning that 17 TRAP mounds share an AKB number.
How many features/mounds have been excavated in Yambol by 2023
Simple feature collection with 28 features and 27 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 447232.5 ymin: 4662878 xmax: 493995.2 ymax: 4715727
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
TopoID Note TRAP Source Type LU_Around LU_Top
1 200029 <NA> 9431 Legacy verification Burial Mound Pasture Pasture
2 200080 <NA> 9430 Legacy verification Burial Mound Pasture Pasture
Date DiameterMax HeightMax Condition PrincipalSourceOfImpact
1 2010-11-10 20 2 5 Looting
2 2010-11-10 10 NA 5 Looting
AllNotes distTown
1 one big trench 10x10 1006.1010
2 one big trench 9x6, totally destroyed, check excavation report 966.9691
distTownBoundary ElevAster_m SlopeAster_degrees AspectAster TRI
1 1006.1010 144.0196 3.487972 353.33450 1.409735
2 966.9691 147.0415 3.966855 28.08201 1.574440
TPI Roughness prom250mbuff TypeCertainty distBG AKB Area
1 0.7014713 4.790833 83.09 Certain 64932.45 10001377 Drazhevo
2 0.6184120 5.340210 29.14 Certain 64845.70 10001376 Drazhevo
AKBNotes geometry
1 Excavated by Ilia Iliev POINT (455348.3 4711125)
2 Excavated by Ilia Iliev POINT (455447 4711057)
[ reached 'max' / getOption("max.print") -- omitted 8 rows ]
Simple feature collection with 1047 features and 27 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722135
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
TopoID Note TRAP Source Type LU_Around
1 200016 <NA> 9411 Survey Burial Mound Scrub
2 200018 <NA> 9416 Legacy verification Burial Mound Annual agriculture
LU_Top Date DiameterMax HeightMax Condition
1 Scrub 2010-11-07 15 1.5 2
2 Annual agriculture 2010-11-07 20 2.0 3
PrincipalSourceOfImpact AllNotes distTown
1 Nodata <NA> 1806.658
2 Agriculture part of the north slope was ploughted 1303.851
distTownBoundary ElevAster_m SlopeAster_degrees AspectAster TRI
1 1806.658 242.0523 3.527693 133.30900 1.295364
2 1303.851 187.9270 3.088841 57.23311 1.184498
TPI Roughness prom250mbuff TypeCertainty distBG AKB
1 0.5708199 4.609344 7.17 Certain 61029.50 10001492
2 -0.2686462 3.848999 43.48 Certain 62182.78 10001486
Area AKBNotes geometry
1 Drazhevo <NA> POINT (453727.1 4706737)
2 Hadzhidimitrovo <NA> POINT (452999.1 4707739)
[ reached 'max' / getOption("max.print") -- omitted 8 rows ]
Fix column names for legibility
[1] "TopoID" "Note"
[3] "TRAP" "Source"
[5] "Type" "LU_Around"
[7] "LU_Top" "Date"
[9] "DiameterMax" "HeightMax"
[11] "Condition" "PrincipalSourceOfImpact"
[13] "AllNotes" "distTown"
[15] "distTownBoundary" "ElevAster_m"
[17] "SlopeAster_degrees" "AspectAster"
[19] "TRI" "TPI"
[21] "Roughness" "prom250mbuff"
[23] "TypeCertainty" "distBG"
[25] "AKB" "Area"
[27] "AKBNotes" "geometry"
Mark duplicates
A number of features were visited multiple times and registered under different numbers. Spatial duplicates and triplicates are mounds that share the same location (often are within 15 m of one another) but have unique ID.
We have two lists of matching TRAP IDs for 2009-2010 mounds and their post-2010 duplicates.
# Early records
upto2010 <- c(6011, 8022:8025, 8028, 8029, 8030, 8035, 8350:8353, 8357, 8359, 8434,
8669, 9077)
# Later records
post2010 <- c(9357, 9594, 9595, 9593, 9596, 9592, 9591, 9358, 8202, 9226, 9227, 9258,
9257, 9220, 9219, 9216, 9740, 9715)
# to see the pairs, they are collated in output data folder
duplicates <- read.csv("../output_data/interim/duplicates_final.txt", sep = " ")In order to highlight the duplicates and allow users to filter the
early or later variant (2009-2010 or post-2010), we create a
SpatialDuplicate column, declare the paired TRAP ID in the
PairedID column, and create a Pre2010Variant
and Post2010Variant columns to facilitate filtering of the
early and later version.
# Label the spatial versions/duplicates as such
Y_features$SpatialDuplicate <- ifelse(Y_features$TRAP %in% duplicates$upto2010 |
Y_features$TRAP %in% duplicates$post2010, "Duplicate", "Unique")Add PairedID column and fill in with other version TRAP
id where applicable
# Initialize PairedID column with NA
Y_features$PairedID <- NA
# For upto2010 records, use the TRAP id in post2010 column of duplicates
Y_features$PairedID[Y_features$TRAP %in% duplicates$upto2010] <- duplicates$post2010[match(Y_features$TRAP[Y_features$TRAP %in%
duplicates$upto2010], duplicates$upto2010)]
# For post2010 records, use TRAP id from upto2010 from duplicates
Y_features$PairedID[Y_features$TRAP %in% duplicates$post2010] <- duplicates$upto2010[match(Y_features$TRAP[Y_features$TRAP %in%
duplicates$post2010], duplicates$post2010)]
Y_features %>%
st_drop_geometry() %>%
dplyr::select(TRAP, PairedID, SpatialDuplicate) %>%
filter(SpatialDuplicate == "Duplicate") TRAP PairedID SpatialDuplicate
1 9226 8350 Duplicate
2 9220 8357 Duplicate
3 9258 8352 Duplicate
4 9257 8353 Duplicate
5 9216 8434 Duplicate
6 9358 8030 Duplicate
7 9227 8351 Duplicate
8 9219 8359 Duplicate
9 9357 6011 Duplicate
10 6011 9357 Duplicate
11 8035 8202 Duplicate
12 8022 9594 Duplicate
13 8024 9593 Duplicate
14 8023 9595 Duplicate
15 8025 9596 Duplicate
16 8028 9592 Duplicate
17 8029 9591 Duplicate
18 8030 9358 Duplicate
19 8350 9226 Duplicate
20 8351 9227 Duplicate
21 8352 9258 Duplicate
22 8353 9257 Duplicate
23 8357 9220 Duplicate
24 8359 9219 Duplicate
25 9591 8029 Duplicate
[ reached 'max' / getOption("max.print") -- omitted 10 rows ]
Create two columns Upto2010 and Post2010 to
facilitate filtering by different versions.
`%nin%` <- Negate(`%in%`)
Y_features$Upto2010 <- ifelse(Y_features$TRAP %nin% duplicates$post2010, "Yes", "No")
Y_features$Post2010 <- ifelse(Y_features$TRAP %nin% duplicates$upto2010, "Yes", "No")
# review
Y_features %>%
st_drop_geometry() %>%
dplyr::select(TRAP, PairedID, SpatialDuplicate, Upto2010, Post2010) %>%
filter(SpatialDuplicate == "Duplicate") TRAP PairedID SpatialDuplicate Upto2010 Post2010
1 9226 8350 Duplicate No Yes
2 9220 8357 Duplicate No Yes
3 9258 8352 Duplicate No Yes
4 9257 8353 Duplicate No Yes
5 9216 8434 Duplicate No Yes
6 9358 8030 Duplicate No Yes
7 9227 8351 Duplicate No Yes
8 9219 8359 Duplicate No Yes
9 9357 6011 Duplicate No Yes
10 6011 9357 Duplicate Yes No
11 8035 8202 Duplicate Yes No
12 8022 9594 Duplicate Yes No
13 8024 9593 Duplicate Yes No
14 8023 9595 Duplicate Yes No
15 8025 9596 Duplicate Yes No
[ reached 'max' / getOption("max.print") -- omitted 20 rows ]
Export Yambol features
To export these properly, select your desired version to run the
script with either the early or the later dataset. Also, mke sure to
select the best format for your further processing: rds(R) or
geojson(Python). Default is the _early version and
rds format.
[1] "TopoID" "TopoNote"
[3] "TRAP" "Source"
[5] "Type" "LanduseAround"
[7] "LanduseOnTop" "Date"
[9] "DiameterMax_m" "HeightMax_m"
[11] "Condition" "PrincipalSourceOfImpact"
[13] "AllNotes" "DistTown_m"
[15] "DistTownBoundary_m" "ElevAster_m"
[17] "SlopeAster_degrees" "AspectAster"
[19] "TRI" "TPI"
[21] "Roughness" "Prominence_at250m"
[23] "TypeCertainty" "DistBG_m"
[25] "AKB" "AKBArea"
[27] "AKBNotes" "geometry"
[29] "SpatialDuplicate" "PairedID"
[31] "Upto2010" "Post2010"
Rows: 1,260
Columns: 32
$ TopoID <dbl> 200016, 200018, 200019, 200020, 200022, 200023…
$ TopoNote <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ TRAP <dbl> 9411, 9416, 9417, 9409, 9314, 9315, 9415, 9418…
$ Source <chr> "Survey", "Legacy verification", "Legacy verif…
$ Type <chr> "Burial Mound", "Burial Mound", "Burial Mound"…
$ LanduseAround <chr> "Scrub", "Annual agriculture", "Pasture", "Ann…
$ LanduseOnTop <chr> "Scrub", "Annual agriculture", "Pasture", "Pas…
$ Date <date> 2010-11-07, 2010-11-07, 2010-11-07, 2010-11-0…
$ DiameterMax_m <dbl> 15, 20, 15, 30, 30, 20, 25, 50, 15, 20, 10, 23…
$ HeightMax_m <dbl> 1.5, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 6.0, 1.0, 2…
$ Condition <fct> 2, 3, 4, 2, 5, 2, 5, 2, 3, 5, 5, 5, 5, 5, 1, 5…
$ PrincipalSourceOfImpact <chr> "Nodata", "Agriculture", "Looting", "Agricultu…
$ AllNotes <chr> NA, "part of the north slope was ploughted", …
$ DistTown_m <dbl> 1806.65789, 1303.85117, 1276.73451, 1486.64146…
$ DistTownBoundary_m <dbl> 1806.65789, 1303.85117, 1276.73451, 1486.64146…
$ ElevAster_m <dbl> 242.0523, 187.9270, 185.9483, 215.0000, 227.64…
$ SlopeAster_degrees <dbl> 3.5276933, 3.0888411, 3.1204007, 0.5485047, 3.…
$ AspectAster <dbl> 133.3090024, 57.2331062, 53.8955957, 53.731018…
$ TRI <dbl> 1.2953644, 1.1844978, 1.0071316, 0.1759815, 1.…
$ TPI <dbl> 5.708199e-01, -2.686462e-01, -9.536743e-06, 1.…
$ Roughness <dbl> 4.6093445, 3.8489990, 4.0048981, 0.8009949, 4.…
$ Prominence_at250m <dbl> 7.17, 43.48, 12.04, 11.15, 67.27, 77.14, 17.58…
$ TypeCertainty <chr> "Certain", "Certain", "Certain", "Certain", "C…
$ DistBG_m <dbl> 61029.50, 62182.78, 62264.91, 61452.66, 60782.…
$ AKB <dbl> 10001492, 10001486, 10001487, 10001371, 100014…
$ AKBArea <chr> "Drazhevo", "Hadzhidimitrovo", "Hadzhidimitrov…
$ AKBNotes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "Excavated…
$ geometry <POINT [m]> POINT (453727.1 4706737), POINT (452999.…
$ SpatialDuplicate <chr> "Unique", "Unique", "Unique", "Unique", "Uniqu…
$ PairedID <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Upto2010 <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes…
$ Post2010 <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes…
# Export
Y_features %>%
write_rds("../output_data/Y_features.rds")
Y_features %>%
st_write("../output_data/Y_features.geojson", append = F)Deleting layer not supported by driver `GeoJSON'
Deleting layer `Y_features' failed
Writing layer `Y_features' to data source
`../output_data/Y_features.geojson' using driver `GeoJSON'
Updating existing layer Y_features
Writing 1260 features with 31 fields and geometry type Point.
Filter mounds
Now that the attributes look reasonably well, let’s filter out and export the most likely mounds inside the Yambol Province.
Simple feature collection with 4 features and 2 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722135
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 4 × 3
Type n geometry
<chr> <int> <MULTIPOINT [m]>
1 Burial Mound 927 ((433010.4 4694365), (433295.2 4693180), (433954.3…
2 Extinct Burial Mound 168 ((433034.9 4693186), (433441.6 4692413), (435565.5…
3 Other 72 ((445501 4697842), (462171.2 4655463), (462313 468…
4 Uncertain Feature 31 ((445710.4 4699311), (457010.8 4671808), (462076.8…
Export Yambol mounds
Moving to export, .rds, .geojson, and .csv are generated below. For
reuse, remember to filter for either the Upto2010 or
Post2010 variant. The latter is perhaps more authoritative
as the later observations are more “current”. But there is good reason
to use the former variant if you need to add AKB information or both
variants together if you wish to look for change in observations between
teams and seasons.
Y_features %>%
filter(grepl("[Mm]ound", Type)) %>%
write_rds("../output_data/Y_mounds.rds")
Y_features %>%
filter(grepl("[Mm]ound", Type)) %>%
st_write("../output_data/Y_mounds.geojson", append = FALSE)Deleting layer not supported by driver `GeoJSON'
Deleting layer `Y_mounds' failed
Writing layer `Y_mounds' to data source
`../output_data/Y_mounds.geojson' using driver `GeoJSON'
Updating existing layer Y_mounds
Writing 1095 features with 31 fields and geometry type Point.
Y_features %>%
filter(grepl("[Mm]ound", Type)) %>%
mutate(X = st_coordinates(.)[, 2], Y = st_coordinates(.)[, 1]) %>%
mutate(DistBG_m = as.vector(unclass(DistBG_m)), DistTown_m = unclass(DistTown_m),
DistTownBoundary_m = unclass(DistTownBoundary_m)) %>%
st_drop_geometry() %>%
write_csv("../output_data/Y_mounds.csv")